Thirty Days of Metal — Day 26: Normal Mapping

Warren Moore
8 min read · May 12, 2022

This series of posts is my attempt to present the Metal graphics programming framework in small, bite-sized chunks for Swift app developers who haven’t done GPU programming before.

If you want to work through this series in order, start here. To download the sample code for this article, go here.

Last time, we looked at how to use environment mapping to cheaply add a modicum of realism to reflective and refractive materials. This time, we consider how to create the illusion of finer surface detail without adding extra geometry, using a technique called normal mapping.

Normal Mapping

Normal mapping is a rendering technique that supplies additional apparent surface detail in the form of a texture map. In the same way that we add color information in between vertices with ordinary texture mapping, we add variation in the surface normal with a normal map.

We have been using interpolated vertex normals along the way to compute per-pixel lighting. Normal mapping recomputes the normal on a per-pixel basis according to the normals encoded in a normal map. So what does a normal map look like?

The most common normal map type has a purplish hue. Normals are perpendicular to the surface, meaning they are predominantly aligned with the local Z axis. As a vector, the local +Z direction is the vector [0 0 1]. Interpreted as a color with normalized components, this is pure blue. But the normals in a normal map need to be transformed so that negative direction components can be encoded as positive color components. This is done by adding 1 to the normal’s components and dividing the result by 2. The resulting vector can represent any unit vector by using components between 0 and 1. The purplish tinge of normal maps comes from the fact that the +Z direction is encoded to the color vector [0.5 0.5 1], which is a light purple. Normals that point in the +X direction have a red tint, while normals that point in the +Y direction have a green tint, as you might expect.
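The encode/decode round trip is simple arithmetic on each component. A minimal sketch in plain C++ (standing in for the equivalent shader math; `encode` and `decode` are illustrative names, not from the sample code):

```cpp
// Encode one component of a unit normal from [-1, 1] into color range [0, 1].
float encode(float n) { return (n + 1.0f) * 0.5f; }

// Decode one color component from [0, 1] back into direction range [-1, 1].
float decode(float c) { return c * 2.0f - 1.0f; }
```

Applying `encode` to the +Z normal [0 0 1] componentwise yields [0.5 0.5 1], the characteristic light purple.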

Tangent Space

To understand how to interpret these encoded normals and work with them in our shaders, we need to understand which coordinate system they are in. This is a new coordinate space called tangent space.

Unlike model space and world space, tangent space can be different at every point on a model’s surface. Specifically, tangent space is an orthonormal vector basis whose Z axis is aligned with the surface normal, and whose X and Y axes lie in the tangent plane of the surface. The tangent-space X axis is called the tangent, and the Y axis is called the bitangent.

Just knowing that tangent space is an orthonormal basis doesn’t help us choose its X and Y axes, however: there are infinitely many orthonormal bases for any choice of the Z axis. So we need additional information to determine the tangent and bitangent directions.

Fortunately, we have additional information in the form of texture coordinates. At a given point, the u and v coordinates change at a particular rate that depends on the texture mapping parameterization we chose when texture-mapping our mesh. We can use these texture-space derivatives to disambiguate tangent space, by choosing the tangent to point in the direction of most rapid change in the u coordinate, and likewise with the bitangent and the v coordinate.
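To make the derivative construction concrete, here is the classic per-triangle tangent computation in plain C++, sketched from a triangle’s positions and texture coordinates. This is an illustrative fragment of what a baking tool computes internally, not the full algorithm; `triangleTangent` and the small vector structs are hypothetical names:

```cpp
struct Vec3 { float x, y, z; };
struct Vec2 { float x, y; };

// Tangent of a single triangle: the direction of most rapid change in the
// u coordinate, derived from position edges and texture-coordinate deltas.
Vec3 triangleTangent(Vec3 p0, Vec3 p1, Vec3 p2, Vec2 w0, Vec2 w1, Vec2 w2) {
    Vec3 e1 { p1.x - p0.x, p1.y - p0.y, p1.z - p0.z };
    Vec3 e2 { p2.x - p0.x, p2.y - p0.y, p2.z - p0.z };
    Vec2 d1 { w1.x - w0.x, w1.y - w0.y };
    Vec2 d2 { w2.x - w0.x, w2.y - w0.y };
    // Assumes a non-degenerate UV mapping (nonzero texture-space area).
    float r = 1.0f / (d1.x * d2.y - d1.y * d2.x);
    return Vec3 { (e1.x * d2.y - e2.x * d1.y) * r,
                  (e1.y * d2.y - e2.y * d1.y) * r,
                  (e1.z * d2.y - e2.z * d1.y) * r };
}
```

For a unit triangle in the XY plane whose UVs match its XY coordinates, this produces the +X axis, exactly the direction in which u increases fastest. A real baker then averages per-triangle tangents into per-vertex tangents and orthonormalizes them against the normal.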

The complete math for this is beyond the scope of this article, but we will discuss how to go about generating normal maps in the next section.

Normal Map Generation

To generate a normal map, we start with a highly tessellated version of the source mesh, called a “high-poly” mesh. The version we will render at run-time has far fewer triangles; this is our “low-poly” mesh. The normal map essentially stores the difference in surface detail between the high-poly and low-poly mesh.

The process of normal map generation consists of constructing a tangent space basis at each texel and encoding the transformation between the coarse-grained interpolated vertex normal and the fine-grained surface normal. Most modern 3D modelers (like Maya and Blender) include features for producing normal maps from high-poly meshes; this procedure is called “normal map baking.”

It is important that the procedure used to generate a normal map is the exact inverse of the procedure used to apply it during lighting. Disparities between conventions and processing can produce unsightly artifacts, as detailed in this Ben Golus article. Fortunately, as time goes on, the MikkTSpace convention is becoming dominant across the visual effects and game communities.

Generating Tangents with Model I/O

We need per-vertex tangents to reconstruct tangent space in our shaders. Many model formats support storing vertex tangents along with vertex positions and normals. Sometimes, though, tangents are missing, either because the format doesn’t support them or because they were not exported.

We can use Model I/O to construct tangents automatically from the other data in a mesh (namely the normals and texture coordinates). After loading a mesh, we call the addTangentBasis(forTextureCoordinateAttributeNamed:normalAttributeNamed:tangentAttributeNamed:) method on MDLMesh. This method uses the existing data in the texture coordinate and normal attributes to populate the tangent attribute.

mdlMesh.addTangentBasis(
    forTextureCoordinateAttributeNamed: MDLVertexAttributeTextureCoordinate,
    normalAttributeNamed: MDLVertexAttributeNormal,
    tangentAttributeNamed: MDLVertexAttributeTangent)

After this method runs, the mesh is no longer guaranteed to conform to the vertex descriptor that was supplied when it was loaded. We can fix this by creating a new Model I/O vertex descriptor that includes a tangent attribute laid out as we prefer:

vertexDescriptor.vertexAttributes[0].name = MDLVertexAttributePosition
vertexDescriptor.vertexAttributes[0].format = .float3
vertexDescriptor.vertexAttributes[0].offset = 0
vertexDescriptor.vertexAttributes[0].bufferIndex = 0
vertexDescriptor.vertexAttributes[1].name = MDLVertexAttributeNormal
vertexDescriptor.vertexAttributes[1].format = .float3
vertexDescriptor.vertexAttributes[1].offset = 12
vertexDescriptor.vertexAttributes[1].bufferIndex = 0
vertexDescriptor.vertexAttributes[2].name = MDLVertexAttributeTangent
vertexDescriptor.vertexAttributes[2].format = .float4
vertexDescriptor.vertexAttributes[2].offset = 24
vertexDescriptor.vertexAttributes[2].bufferIndex = 0
vertexDescriptor.vertexAttributes[3].name = MDLVertexAttributeTextureCoordinate
vertexDescriptor.vertexAttributes[3].format = .float2
vertexDescriptor.vertexAttributes[3].offset = 40
vertexDescriptor.vertexAttributes[3].bufferIndex = 0
vertexDescriptor.bufferLayouts[0].stride = 48

We then re-set the mesh’s vertex descriptor, causing it to lay out the vertex data as expected:

mdlMesh.vertexDescriptor = vertexDescriptor

We also change the input type for our vertex function to include the tangent:

struct VertexIn {
    float3 position  [[attribute(0)]];
    float3 normal    [[attribute(1)]];
    float4 tangent   [[attribute(2)]];
    float2 texCoords [[attribute(3)]];
};

Note that tangent is a four-element vector rather than a three-element vector like position and normal. This is because the fourth element of the tangent encodes the handedness of the tangent basis: a value of 1 indicates that it is right-handed, while a value of -1 indicates that it is left-handed. We use this to correctly reconstruct tangent space in our shaders.
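The reconstruction the fourth component enables is a single cross product scaled by the sign. A sketch in plain C++ (standing in for the shader math; `bitangent` and `Vec3` are illustrative names):

```cpp
struct Vec3 { float x, y, z; };

Vec3 cross3(Vec3 a, Vec3 b) {
    return Vec3 { a.y * b.z - a.z * b.y,
                  a.z * b.x - a.x * b.z,
                  a.x * b.y - a.y * b.x };
}

// Reconstruct the bitangent from the normal, the tangent, and the
// tangent's w component, which encodes the basis's handedness (+1 or -1).
Vec3 bitangent(Vec3 n, Vec3 t, float w) {
    Vec3 b = cross3(n, t);
    return Vec3 { b.x * w, b.y * w, b.z * w };
}
```

With n = +Z, t = +X, and w = +1 this yields +Y (a right-handed basis); w = -1 flips it to -Y, which typically arises when a mesh’s UVs are mirrored.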

Tangent Space and Texture Space

Speaking of handedness, you may have noticed that the tangent space illustrated above is right-handed, while Metal’s texture space is left-handed (with v increasing downwards). How do we account for this?

Separate from all other tangent space conventions, we have to consider the texture space convention of our API. Normal maps are commonly said to use the “OpenGL convention,” which uses a lower-left origin and upward-increasing V axis, or the “DirectX convention,” which uses an upper-left origin and downward-increasing V axis. This difference primarily manifests as OpenGL-style normal maps seeming to be lit “from above” with green highlights, while DirectX-style maps seem lit “from below.”

Metal’s convention matches DirectX, so we either need to use normal maps that already agree with this convention, or invert the green channel when sampling. The sample code expects DirectX-style normal maps.
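Inverting the green channel can be folded into the decode step. A sketch in plain C++ (standing in for the shader code; `decodeNormal` is an illustrative name, not from the sample):

```cpp
struct Vec3 { float x, y, z; };

// Decode a sampled normal-map texel into a tangent-space direction.
// Pass flipGreen = true for OpenGL-convention maps to convert them to
// the DirectX convention that Metal's texture space matches.
Vec3 decodeNormal(Vec3 texel, bool flipGreen) {
    if (flipGreen) texel.y = 1.0f - texel.y;
    return Vec3 { texel.x * 2.0f - 1.0f,
                  texel.y * 2.0f - 1.0f,
                  texel.z * 2.0f - 1.0f };
}
```

Since (1 - g) * 2 - 1 = -(g * 2 - 1), flipping the green channel before decoding is exactly equivalent to negating the decoded Y component afterward.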

Implementing Normal Mapping in Metal

We will continue to perform lighting in view space, since that allows us to offload work to the CPU and vertex shader while retaining maximum precision.

To construct tangent space in the fragment shader, we need to pass the view-space normal and tangent out of our vertex function. We update the output structure as follows:

struct VertexOut {
    float4 position [[position]];
    float3 viewPosition;
    float3 normal;
    float3 tangent;
    float tangentSign [[flat]];
    float2 texCoords;
};

Note that we also carry the tangent sign into the fragment shader so we can flip the tangent basis if required.

We also slightly update the InstanceConstants structure to take a 3x3 matrix that stores the inverse transpose of the model-view matrix. (This is the correct matrix for transforming normals into view space; we’ve been able to neglect it so far because, for rotations and uniform scales, it has the same effect as the upper-left 3x3 of the model-view matrix itself. Under non-uniform scale, the two differ.)

struct InstanceConstants {
    float4x4 modelMatrix;
    float3x3 normalMatrix;
};
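The normal matrix itself is computed on the CPU. In the app you would use simd on the upper-left 3x3 of the model-view matrix; the sketch below spells out the math in plain C++ using the cofactor identity, which says the inverse transpose of a 3x3 matrix is its cofactor matrix divided by its determinant (`inverseTranspose` and `Mat3` are illustrative names):

```cpp
struct Mat3 { float m[3][3]; }; // row-major, for clarity

// Inverse-transpose of a 3x3 matrix: cofactor matrix / determinant.
// Assumes the matrix is invertible (nonzero determinant).
Mat3 inverseTranspose(const Mat3 &a) {
    // 2x2 minor built from rows r0,r1 and columns c0,c1.
    auto minor2 = [&](int r0, int r1, int c0, int c1) {
        return a.m[r0][c0] * a.m[r1][c1] - a.m[r0][c1] * a.m[r1][c0];
    };
    Mat3 cof {{{ minor2(1,2,1,2), -minor2(1,2,0,2),  minor2(1,2,0,1)},
               {-minor2(0,2,1,2),  minor2(0,2,0,2), -minor2(0,2,0,1)},
               { minor2(0,1,1,2), -minor2(0,1,0,2),  minor2(0,1,0,1)}}};
    float det = a.m[0][0] * cof.m[0][0]
              + a.m[0][1] * cof.m[0][1]
              + a.m[0][2] * cof.m[0][2];
    for (auto &row : cof.m)
        for (auto &v : row)
            v /= det;
    return cof;
}
```

For a non-uniform scale like diag(1, 1, 2), this produces diag(1, 1, 0.5): exactly the correction that keeps normals perpendicular to a squashed or stretched surface.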

The body of the vertex function proceeds in the ordinary way, with the addition of the view-space tangent computation:

float4 worldPosition = instance.modelMatrix * float4(in.position, 1.0);
float4 viewPosition = frame.viewMatrix * worldPosition;

VertexOut out;
out.position = frame.projectionMatrix * viewPosition;
out.viewPosition = viewPosition.xyz;
out.normal = normalize(instance.normalMatrix * in.normal);
out.tangent = normalize(instance.normalMatrix * in.tangent.xyz);
out.tangentSign = in.tangent.w;
out.texCoords = in.texCoords;

In the fragment shader, our first goal is to construct a rotation matrix that transforms from tangent space to view space. This is commonly called a “TBN” matrix, since its columns are the tangent, bitangent, and normal, respectively.

float3 Nv = normalize(in.normal);
float3 T = normalize(in.tangent);
float3 B = cross(Nv, T) * in.tangentSign;
float3x3 TBN = { T, B, Nv };

Note that instead of passing in the bitangent, we reconstruct it as the cross product of the normal and tangent (flipping it with the tangent’s sign if needed). Nv is the normalized view-space normal, which is what we’d use for lighting calculations if we weren’t normal-mapping.

To decode the tangent-space normal, we first sample the normal map, then apply the inverse of the encoding calculation:

float3 Nt = normalTexture.sample(linearSampler, in.texCoords).xyz;
Nt = Nt * 2.0 - 1.0;

To get our lighting normal, we transform the tangent-space normal into view space and renormalize (the sampled normal may not be exactly unit-length after 8-bit quantization and filtering):

float3 N = normalize(TBN * Nt);

The remainder of our lighting calculations remain unchanged.

The sample app includes a scene with a couple of different normal-mapped materials. The figure below illustrates the effect of normal mapping, which looks even more convincing in motion.

The illusion produced by normal mapping is far from perfect. When viewed in silhouette and at oblique angles, the low-poly nature of the normal-mapped mesh becomes apparent. Other techniques like parallax mapping and displacement mapping can alleviate this somewhat, with each approach carrying its own computational cost.

Next time we will look at tessellation, a method for generating additional geometry on the fly. Tessellation can be used in concert with other techniques like displacement mapping to add geometric detail without a proportionate increase in vertex attribute memory.
